Syntactic Reordering as Pre-processing Step in Statistical Machine Translation of English to Sesotho sa Leboa and Afrikaans
Abstract
The output quality of statistical machine translation (SMT) depends to a large extent on the quantity and quality of the parallel corpora on which it is trained. For resource-scarce languages, where sufficiently large parallel corpora are not always available, alternative ways of improving the output quality of SMT systems must be sought. This article describes one such method: a pre-processing step that syntactically reorders the source language data. The pre-processing exploits certain systematic differences between the syntax of the source and target languages. Apart from describing the method and the language-specific rules, we also evaluate the resulting machine translation systems for translating from English to Afrikaans and Sesotho sa Leboa (Sepedi).
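As a rough illustration of what such a source-side reordering step might look like, the sketch below applies a single hand-written rule to a POS-tagged English clause. The rule, the function name `reorder_participle`, and the auxiliary list are all hypothetical and are not taken from the article; the only linguistic fact assumed is that Afrikaans places the participle at the end of the clause (as in "Ek het die boek gelees"), so moving the English participle there brings the source word order closer to the target.

```python
# Toy illustration only: this rule is a hypothetical example, not one of the
# article's actual language-specific rules.
AUXILIARIES = {"have", "has", "had"}


def reorder_participle(tagged):
    """Move a past participle (tag VBN) that directly follows an auxiliary
    to the end of the clause, keeping final punctuation last.

    `tagged` is a list of (token, POS tag) pairs for a single clause.
    """
    tokens = list(tagged)
    for i in range(len(tokens) - 1):
        word, _ = tokens[i]
        _, next_tag = tokens[i + 1]
        if word.lower() in AUXILIARIES and next_tag == "VBN":
            participle = tokens.pop(i + 1)
            end = len(tokens)
            # Insert before sentence-final punctuation, if present.
            if tokens and tokens[-1][1] == ".":
                end -= 1
            tokens.insert(end, participle)
            break  # apply the rule at most once per clause
    return tokens


sentence = [("I", "PRP"), ("have", "VBP"), ("read", "VBN"),
            ("the", "DT"), ("book", "NN"), (".", ".")]
reordered = [tok for tok, _ in reorder_participle(sentence)]
print(" ".join(reordered))
```

In a full pipeline, the reordered English text would replace the original source side of the parallel corpus before the SMT system is trained, and the same rules would be applied to any input sentence at translation time.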
Similar resources
Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation
Syntactic Reordering of the source language to better match the phrase structure of the target language has been shown to improve the performance of phrase-based Statistical Machine Translation. This paper applies syntactic reordering to English-to-Arabic translation. It introduces reordering rules, and motivates them linguistically. It also studies the effect of combining reordering with Arabi...
Does Syntactic Knowledge help English-Hindi SMT?
In this paper we explore various parameter settings of the state-of-the-art Statistical Machine Translation system to improve the quality of the translation for a 'distant' language pair like English-Hindi. We propose new techniques for efficient reordering. A slight improvement over the baseline is reported using these techniques. We also show that a simple pre-processing step can improve the qua...
The application of source language information in Chinese-English statistical machine translation
The quality of machine translation (MT) has been significantly improved by using statistical approaches. The integration of syntactic knowledge into a statistical MT system is still an open problem. This talk investigates the application of syntactic knowledge of the source language to the phrase-based MT system for translating Chinese into English. In this thesis, particular issues have been a...
The Lexicographic Treatment of the Demonstrative Copulative in Sesotho sa Leboa — An Exercise in Multiple Cross-referencing
In this research article an in-depth investigation is presented of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This one case study serves as an example to illustrate the so-called 'paradigmatic lemmatisation' of closed-class words in the African languages. The need for such an approach follows a discussion, in Sections 1 and 2 respectively, of the prese...
Clause Restructuring for Statistical Machine Translation
We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is t...
Publication date: 2010